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ABSTRACT: 

An  important  goal  of  computational  linguistics  has  been  to  use  linguistic  theory 
to  guide  the  construction  of  computationally  efficient  real-world  natural  lan¬ 
guage  processing  systems.  At  first  glance,  generalized  phrase  structure  grammar 
(GPSG)  appears  to  be  a  blessing  on  two  counts.  First,  the  precise  formalisms 
of  GPSG  might  be  a  direct  and  transparent  guide  for  parser  design  and  imple¬ 
mentation.  Second,  since  GPSG  has  wealt  context-free  generative  power  and 
context-free  languages  can  be  parsed  in  by  a  wide  range  of  algorithms, 
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time  hard,  and  assuredly  intractable.  The  paper  pinpoints  sources  of  complex¬ 
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1  Introduction 

An  important  goal  of  computational  linguistics  has  been  to  use  linguistic  the¬ 
ory  to  guide  the  construction  of  computationally  efficient  real-world  natural 
language  processing  systems.  Generalized  Phrase  Structure  Grammar  (GPSG) 
linguistic  theory  holds  out  considerable  promise  as  an  aid  in  this  task.  The 
precise  formalisms  of  GPSG  offer  the  prospect  of  a  direct  and  transparent  guide 
for  parser  design  and  implementation.  Furthermore,  and  more  importantly, 
GPSG's  weak  context-free  generative  power  suggests  an  efficiency  advantage 
for  GPSG-based  parsers.  Since  context-free  languages  can  be  parsed  in  poly¬ 
nomial  time,  it  seems  plausible  that  GPSGs  can  also  be  parsed  in  polynomial 
time.  This  would  in  turn  seem  to  provide  “the  beginnings  of  an  explanation  for 
the  obvious,  but  largely  ignored,  fact  that  humans  process  the  utterances  they 
hear  very  rapidly  (Gazdar, 1981:155). Bl 

In  this  paper  I  argue  that  the  expectations  of  the  informal  complexity  argu¬ 
ment  from  weak  context-free  generative  power  are  not  in  fact  met.  I  begin  by 
examining  the  computational  complexity  of  metarules  and  the  feature  system 
of  GPSG  and  show  that  these  systems  can  lead  to  computational  intractability. 
Next  I  prove  that  the  universal  recognition  problem  for  current  GPSG  theory  is 
Exp-Poly  hard,  and  assuredly  intractable.2  That  is,  the  problem  of  determining 
for  an  arbitrary  GPSG  G  and  input  strmg  *  whether  x  is  in  the  language  L(G) 
generated  by  G,  is  exponential  polynomial  time  hard.  This  result  puts  GPSG- 
Recognition  in  a  complexity  class  occupied  by  few  natural  problems:  GPSG- 
Recognition  is  harder  than  the  traveling  salesman  problem,  context-sensitive 
language  recognition,  or  winning  the  game  of  Chess  on  an  n  x  n  board.  The 
complexity  classification  shows  that  the  fastest  recognition  algorithm  for  GPSGs 
must  take  exponential  time  or  worse.  One  role  of  a  computational  analysis  is  to 
provide  formal  insights  into  linguistic  theory.  To  this  end,  this  paper  pinpoints 
sources  of  complexity  in  the  current  GPSG  theory  and  concludes  with  some 
linguistically  and  computationally  motivated  restrictions. 


2  Complexity  of  GPSG  Components 

A  generalized  phrase  structure  grammar  contains  five  language-particular  com¬ 
ponents  —  immediate  dominance  (ID)  rules,  metarules,  linear  precedence  (LP) 

*See  also  Joshi,  “Tree  Adjoining  Grammars"  p.226,  in  Natural  Language  Parting  (1985)  ed. 
by  D.  Dowty,  L.  Karttunen,  and  A.  Zwicky,  Cambridge  University  Press:  Cambridge,  and 
'Exceptions  to  the  Rule,"  Science  Newe  128:  314-315. 

aWe  use  the  universal  problem  to  more  accurately  explore  the  power  of  a  grammatical 
formalism  (see  section  3.1  below  for  support).  Ristad(1985)  has  previously  proven  that  the 
universal  recognition  problem  for  the  GPSG’s  of  Gaidar(1981)  is  NP-hard  and  likely  to  be 
intractable,  even  under  severe  metarule  restrictions. 
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statements,  feature  cooccurrence  restrictions  (FCRs),  and  feature  specification 
defaults  (FSDs)  —  and  four  universal  components  —  a  theory  of  syntactic  fea¬ 
tures,  principles  of  universal  feature  instantiation,  principles  of  semantic  inter¬ 
pretation,  and  formal  relationships  among  various  components  of  the  grammar.3 

Syntactic  categories  are  partial  functions  from  features  to  atomic  feature 
values  and  syntactic  categories.  They  encode  subcategorization,  agreement, 
unbounded  dependency,  and  other  significant  syntactic  information.  The  set  K 
of  syntactic  categories  is  inductively  specified  by  listing  the  set  F  of  features, 
the  set  A  of  atomic  feature  values,  the  function  p  that  defines  the  range  of 
each  atomic-valued  feature,  and  a  set  R  of  restrictive  predicates  on  categories 
(FCRs). 

The  set  of  ID  rules  obtained  by  taking  the  finite  closure  of  the  metarules  on 
the  ID  rules  is  mapped  into  local  phrase  structure  trees,  subject  to  principles 
of  universal  feature  instantiation,  FSDs,  FCRs,  and  LP  statements.  Finally, 
local  trees  are  assembled  to  form  phrase  structure  trees,  which  are  terminated 
by  lexical  elements. 

To  identify  sources  of  complexity  in  GPSG  theory,  we  consider  the  isolated 
complexity  of  the  finite  metarule  closure  operation  and  the  rule  to  tree  map¬ 
ping,  using  the  finite  closure  membership  and  category  membership  problems, 
respectively.  Informally,  the  finite  closure  membership  problem  is  to  determine 
if  an  ID  rule  is  in  the  finite  closure  of  a  set  of  metarules  M  on  a  set  of  ID  rules 
R.  The  category  membership  problem  is  to  determine  if  a  category  Cora  legal 
extension  of  C  is  in  the  set  K  of  all  categories  based  on  the  function  p  and  the 
sets  A,  F  and  R.  Note  that  both  problems  must  be  solved  by  any  GPSG-based 
parsing  system  when  computing  the  ID  rule  to  local  tree  mapping. 

The  major  results  are  that  finite  closure  membership  is  NP-hard  and  cat¬ 
egory  membership  is  PSPACE-hard.  Barton(1985)  has  previously  shown  that 
the  recognition  problem  for  ID/LP  grammars  is  NP-hard.  The  components  of 
GPSG  theory  are  computationally  complex,  as  is  the  theory  as  a  whole. 


Assumptions.  In  the  following  problem  definitions,  we  allow  syntactic  cate¬ 
gories  to  be  based  on  arbitrary  sets  of  features  and  feature  values.  In  actuality, 
GPSG  syntactic  categories  are  based  on  fixed  sets  and  a  fixed  function  p.  As 
such,  the  set  K  of  permissible  categories  is  finite,  and  a  large  table  containing 
K  could,  in  principle,  be  given.4  We  generalize  to  arbitrary  sets  and  an  arbi¬ 
trary  function  p  to  prevent  such  a  solution  while  preserving  GPSG’s  theory  of 

®This  work  i«  based  on  current  GPSG  theory  as  presented  in  Gaidar  et.  al.  (1985), 
hereafter  GKPS.  The  reader  is  urged  to  consult  that  work  for  a  formal  presentation  and 
thorough  exposition  of  current  GPSG  theory. 

4This  suggestion  is  of  no  practical  significance,  because  the  actual  number  of  GPSG  syntac¬ 
tic  categories  is  extremely  large.  The  total  number  of  categories,  given  the  25  atomic  features 
and  4  category-valued  features,  is: 
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syntactic  features.6  No  other  modifications  to  the  theory  are  made. 

An  ambiguity  in  GKPS  is  how  the  FCRs  actually  apply  to  embedded  categories.0 
Following  Ivan  Sag  (personal  communication),  I  make  the  natural  assumption 
here  that  FCRs  apply  top-level  and  to  embedded  categories  equally. 


2.1  Metarules 

The  complete  set  of  ID  rules  in  a  GPSG  is  the  maximal  set  that  can  be  arrived 
at  by  taking  each  metarule  and  applying  it  to  the  set  of  rules  that  have  not 
themselves  arisen  as  a  result  of  the  application  of  that  metarule.  This  maximal 
set  is  called  the  finite  closure  {FC)  of  a  set  R  of  lexical  ID  rules  under  a  set  M 
of  metarules. 

The  cleanest  possible  complexity  proof  for  metarule  finite  closure  would  fix 
the  GPSG  (with  the  exception  of  metarules)  for  a  given  problem,  and  then 
construct  metarules  dependent  on  the  problem  instance  that  is  being  reduced. 
Unfortunately,  metarules  cannot  be  cleanly  removed  from  the  GPSG  system. 
Metarules  take  ID  rules  as  input,  and  produce  other  ID  rules  as  their  output. 
If  we  were  to  separate  metarules  from  their  inputs  and  outputs,  there  would  be 
nothing  left  to  study. 

The  best  complexity  proof  for  metarules,  then,  would  fix  the  GPSG  mod¬ 
ulo  the  metarules  and  their  input.  We  ensure  the  input  is  not  inadvertently 
performing  some  computation  by  requiring  the  one  ID  rule  R  allowed  in  the  re¬ 
duction  to  be  fully  specified,  with  only  one  0-level  category  on  the  left-hand  side 
and  one  unanalysable  terminal  symbol  on  the  right-hand  side.  Furthermore,  no 
FCRs,  FSDs,  or  principles  of  universal  feature  instantiation  are  allowed  to  ap¬ 
ply.  These  are  exceedingly  severe  constraints.  The  ID  rules  generated  by  this 
formal  system  will  be  the  finite  closure  of  the  lone  ID  rule  R  under  the  set  M 


\K  =  K*\  =  33S((1  +  33S)((1  +  335)((1  +  335)(1  +  335)1)3)3)4 

=  3SS(1  +  3s5)64  >  31®3*  >  io7T& 


See  page  10  for  details.  Many  of  these  categories  will  be  linguistically  meaningless,  but  all 
GPSGs  will  generate  all  of  them  and  then  filter  some  out  in  consideration  of  FCRs,  FSDs, 
universal  feature  instantiation,  and  the  other  admissible  local  trees  and  lexical  entries  in  the 
GPSG.  While  the  FCRs  in  some  grammars  may  reduce  the  number  of  categories,  FCRs  are 
a  language-particular  component  of  the  grammar.  The  vast  number  of  categories  cited  above 
is  mhennt  in  the  GPSG  framework. 

5Our  goal  is  to  identify  sources  of  complexity  in  GPSG  theory.  The  generalisation  to 
arbitrary  sets  allows  a  fine-grained  study  of  one  component  of  GPSG  theory  (the  theory  of 
syntactic  features)  with  the  tools  of  computational  complexity  theory.  Similarly,  the  chess 
board  is  uncontroversially  generalised  to  sise  sxn  in  order  to  study  the  computational 
complexity  of  chess. 

®A  category  C  that  is  defined  for  a  feature  /,  /  6  (F  -  Aton)  n  DOM(C)  (e  g.  /  =  SLASH  ), 
contains  an  embedded  category  C«,  where  C{})  =  C,.  GKPS  does  not  explain  whether  FCR’s 
must  be  true  of  Ce  as  well  as  C . 
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of  metarules. 

The  (strict,  resp.)  finite  closure  membership  problem  for  GPSG  metarules 
is:  Given  an  ID  rule  r  and  sets  of  metarules  M  and  ID  rules  R,  determine  if  3r‘ 
such  that  r'  3  r  (r'  =  r,  resp.)  and  r'  6  FC(M,  R). 

Theorem  1  Finite  Closure  Membership  is  NP-hard 

Proof:  On  input  3-CNF  formula  F  of  length  n  using  the  m  variables  ii . . .  xm, 
reduce  3-SAT,  a  known  NP-complete  problem,  to  Metarule-Membership  in  poly¬ 
nomial  time. 

The  set  of  ID  rules  consists  of  the  one  ID  rule  R,  whose  mother  category 
represents  the  formula  variables  and  clauses,  and  a  set  of  metarules  M  s.t.  an 
extension  of  the  ID  rule  A  is  in  the  finite  closure  of  M  over  R  iff  F  is  satisfiable. 
The  metarules  generate  possible  truth  assignments  for  the  formula  variables,  and 
then  compute  the  truth  value  of  F  in  the  context  of  those  truth  assignments. 

Let  w  be  the  string  of  formula  literals  in  F,  and  let  to,  denote  the  ith  symbol 
in  the  string  to. 

1.  The  ID  rules  R,  A 

R  :  F  — ►  <satisf iability> 

A  :  [[STAGE  3]|  —  Oatisf  iable> 

where 

<satisfiable>  is  a  terminal  symbol 
<satisfiability>  is  a  terminal  symbol 
F  =  {[y.  0) :  1  <  *  <  m) 

U  {[c.  0] :  1  <  t  <  1^} 

U  {[STAGE  1]} 


2.  Construct  the  metarules 

(a)  m  metarules  to  generate  all  possible  assignments  to  the  variables 

Vt  ,  I  <  t  <  m 

{[y>  0],  [STAGE  1]}  — »  W 

{[y,  lj,  [STAGE  1]}  -  W 

(b)  one  metarule  to  stop  the  assignment  generation  process 


2  COMPLEXITY  OF  GPSG  COMPONENTS 


5 


{[STAGE  1]}  -*  W 

^  (2) 
{[STAGE  2]}  —  W 

(c)  |u>  |  metarules  to  verify  assignments 

Vi,  j,  k  1  <  *■  <  1|1,  1  <  y  <  m,  0  <  it  <  2, 

if  u’3 i-k  =  *y,  then  construct  the  metarule 

{[»y  l],[e<  0],  [STAGE  2)}  -  W 

*  (3) 

{[Vi  1],  [c,-  1],  [STAGE  2\}-^W 

Vi,  j,  k  1  <  i  <  M  1  <  y  <  m,  o  <  it  <  2, 

if  u>3j_fc  =  2EJ,  then  construct  the  metarule 

{[y;  0],|Ci  0],  [STAGE  2]}  -  W 

^  (4) 

{[yy  0],  jc,  1],  [STAGE  2))^W 

(d)  Let  the  category  C  =  {[c,  l]  :  1  <  i  <  ^}.  Construct  the  metarule 

CjSTAGE  2 )-*W 

*  (5) 

{[STAGE  3)}  —  <sati8fiable> 

The  reduction  constructs  0(|  w  |)  metarules  of  size  log(|  w  |),  and  clearly  may 
be  performed  in  polynomial  time:  the  reduction  time  is  essentially  the  number 
of  symbols  needed  to  write  down  the  GPSG.  Note  that  the  strict  finite  closure 
membership  problem  is  also  NP-hard.  One  need  only  add  a  polynomial  number 
of  metarules  to  “change”  the  feature  values  of  the  mother  node  C  to  some 
canonical  value  when  C(STAGE  )  =  3  —  all  0,  for  example,  with  the  exception 
of  STAGE  .  Let  F  =  {[y<  0)  :  1  <  i  <  m}  U  {[c,  0]  :  1  <  i  <  l£i}.  Then  A 
would  be 


A  :  F(STAGE  3]  — * <satisf iable> 


Q.e.D 

The  major  source  of  intractability  is  the  finite  closure  operation  itself.  In¬ 
formally,  each  metarule  can  more  than  double  the  number  of  ID  rules,  hence  by 
chaining  metarules  (i.e.  by  applying  the  output  of  a  metarule  to  the  input  of  the 
next  metarule)  finite  closure  can  increase  the  number  of  ID  rules  exponentially.7 

7More  precisely,  the  metarule  finite  closure  operation  can  increase  the  sire  of  a  GPSG  G 
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2.2  A  Theory  of  Syntactic  Features 

Here  we  show  that  the  complex  feature  system  employed  by  GPSG  leads  to 
computational  intractability.  The  underlying  insight  for  the  following  complex¬ 
ity  proof  is  the  almost  direct  equivalence  between  Alternating  Turing  Machines 
(ATMs)  and  syntactic  categories  in  GPSG.  The  nodes  of  an  ATM  computa¬ 
tion  correspond  to  O-level  syntactic  categories,  and  the  ATM  computation  tree 
corresponds  to  a  full,  n-level  syntactic  category.  The  finite  feature  closure  re¬ 
striction  on  categories,  which  limits  the  depth  of  category  nesting,  will  limit 
the  depth  of  the  corresponding  ATM  computation  tree.  Finite  feature  closure 
constrains  us  to  specifying  (at  most)  a  polynomially  deep,  polynomially  branch¬ 
ing  tree  in  polynomial  time.  This  is  exactly  equivalent  to  a  polynomial  time 
ATM  computation,  and  by  Chandra  and  Stockmeyer(1976),  also  equivalent  to 
a  deterministic  polynomial  space-bounded  Turing  Machine  computation. 

As  a  consequence  of  the  above  insight,  one  would  expect  the  GPSG  Category- 
Membership  problem  to  be  PSPACE-hard.  The  actual  proof  is  considerably 
simpler  when  framed  as  a  reduction  from  the  Quantified  Boolean  Formula  (QBF) 
problem,  a  known  PSPACE-complete  problem. 

Definition: 

The  QBF  problem  is  defined  as  follows.  “Quantified  Boolean  formulas  (QBF) 
are  built  from  variables,  the  operators  V,  A,  and  parentheses,  and  the  quan¬ 
tifiers  3  (‘there  exists’)  and  V  (‘for  all’).  When  defining  QBF’s  recursively,  we 
find  it  useful  simultaneously  to  define  free  occurrences  of  variables  (occurrences 
to  which  no  quantifier  applies),  and  the  scope  of  a  quantifier  (those  occurrences 
to  which  the  quantifier  applies). 

1.  If  x  is  a  variable,  then  it  is  a  QBF.  The  occurrence  of  i  is  free. 

2.  If  Ei  and  E 2  are  QBF’s,  so  are  — >(£7i ),  (Ei)  V  (E?),  and  (Ei)  A  (£3).  An 
occurrence  of  x  is  free  or  bound,  depending  on  whether  the  occurrence  is 
free  or  bound  in  E\  or  E 2.  Redundant  parentheses  can  be  omitted. 

3.  IF  E  is  a  QBF,  then  3 x{E)  and  'ix(E )  are  QBF’s.  The  scopes  of  3x  and 
Vx  are  all  free  occurrences  of  x  in  E.  (Note  that  there  may  also  be  bound 
occurrences  of  x  in  E\  theses  are  not  part  of  the  scope.)  Free  occurrences 
of  x  in  E  are  bound  in  3x(2?)  and  'ix(E).  All  other  occurrences  of  variables 

worw  than  exponentially:  from  |G|  to  0(|G|3^).  Given  a  set  of  ID  rule*  R  of  symbol  sire  n, 
and  a  *et  M  of  m  metarule,  each  of  sise  p,  the  *ymbol  sire  of  FC(M,R)'n  0(n2m)  =  C>(|G  |3^). 
Each  metarule  can  match  the  productions  in  R  O(n)  different  way*,  inducing  0(n+  p)  new 
symbol*  per  match:  each  metarule  can  therefore  square  the  ID  rule  grammar  site.  There  are 
m  metarules,  so  finite  closure  can  create  an  ID  rule  grammar  with  0(n3m)  symbols. 
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in  E  are  free  or  bound,  depending  on  whether  they  are  free  or  bound  in 
E. 

“A  QBF  with  no  free  variables  has  a  value  of  either  true  or  false,  which 
we  denote  by  the  Boolean  constants  1  and  0.  The  value  of  such  a  QBF  is 
determined  by  replacing  each  subexpression  of  the  form  Bz(£)  by  Eq  A  Ei  and 
each  subexpression  of  the  form  Vx(E)  by  E0  V  Elt  where  E0  and  Ei  are  E 
with  all  occurrences  of  x  in  the  scope  of  the  quantifier  replaced  by  0  and  1, 
respectively  (Hopcraft  and  Ullman, 1979:343-344).” 

The  QBF  problem  is  {QiyiQtyi  •  •  •  QmymF(yi,  y2,  ■  ■  ■ ,  ym)  |  Qi  e  {V,3}, 
where  the  t/<  are  boolean  variables,  F  is  a  boolean  formula  of  length  n  in  con¬ 
junctive  normal  form  with  exactly  three  variables  per  clause  (3-CNF),  and  the 
quantified  formula  is  true}. 

Let  a  specification  oj  K  be  the  arbitrary  sets  of  features  F,  atomic  features 
Atom,  atomic  feature  values  A,  and  feature  co-occurrence  restrictions  R  and  let 
p  be  an  arbitrary  function,  all  equivalent  to  those  defined  in  chapter  2  of  GKPS. 
The  category  membership  problem  is:  Given  a  category  C  and  a  specification  of 
a  set  K  of  syntactic  categories,  determine  if  3 C'  s.t.  C'  □  C  and  C'  G  K. 

Theorem  2  GPSG  Category-Membership  is  PSPACE-hard 

Proof:  By  reduction  from  QBF.  On  input  formula 


fi  =  QiyiQiy?  ■  ■  ■  QmymF(yi,y2,  ■■■,ym) 


we  construct  an  instance  P  of  the  Category-Membership  problem  in  poly¬ 
nomial  time,  such  that  fi  €  QBF  if  and  only  if  P  is  true. 

Consider  the  QBF  as  a  strictly  balanced  binary  tree,  where  the  ith  quantifier 
Qi  represents  pairs  of  subtrees  <  Tt,  Tf  >  such  that  (1)  Tt  and  Tj  each  immedi¬ 
ately  dominate  pairs  of  subtrees  representing  the  quantifiers  Qi+i  ■■■Qm ,  and 
(2)  the  Ith  variable  t/j  i3  true  in  Tt  and  false  in  Tj.  All  nodes  at  level  t  in  the 
whole  tree  correspond  to  the  quantifier  Qi.  The  leaves  of  the  tree  are  differ¬ 
ent  instantiations  of  the  formula  F,  corresponding  to  the  quantifier-determined 
truth  assignments  to  the  m  variables.  A  leaf  node  is  labeled  true  if  the  instan¬ 
tiated  formula  F  that  it  represents  is  true.  An  internal  node  in  the  tree  at  level 
t  is  labeled  true  if 

1.  Qi  =  “3”  and  either  daughter  is  labeled  true,  or 

2.  Qi  =  “V”  and  both  daughters  are  labeled  true. 


jlJ  * 


i  aJ\  A-VmJl 


Hj dj\. 
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Otherwise,  the  node  is  labeled  false. 

Similarly,  categories  can  be  understood  as  trees,  where  the  features  in  the 
domain  of  a  category  constitute  a  node  in  the  tree,  and  a  category  C  immediately 
dominates  all  categories  C'  such  that  3/  €  (( F  —  Atom)  n  D0M(C))[C(/)  =  C'\. 

In  the  QBF  reduction,  the  atomic-valued  features  are  used  to  represent  the  m 
variables,  the  clauses  of  F,  the  quantifier  the  category  represents,  and  the  truth 
label  of  the  category.  The  category-valued  features  represent  the  quantifiers  — 
two  category-valued  features  qk>q'k  represent  the  subtree  pairs  <  Tt)Tj  >  for 
the  quantifier  Qk-  FCRs  maintain  quantifier- imposed  variable  truth  assignments 
“down  the  tree”  and  calculate  the  truth  labeling  of  all  leaves,  according  to  F, 
and  internal  nodes,  according  to  quantifier  meaning. 


Details.  Let  to  be  the  string  of  formula  literals  in  F,  and  u\  denote  the  ith 
symbol  in  the  string  to.  We  specify  a  set  K  of  permissible  categories  based  on 
A,  F,  p,  and  the  set  of  FCRs  R  s.t.  the  category  [[LABEL  lj]  or  an  extension  of 
it  is  an  element  of  K  iff  fi  is  true. 

First  we  define  the  set  of  possible  O-level  categories,  which  encode  the  formula 
F  and  truth  assignments  to  the  formula  variables.  The  feature  to,  represents 
the  formula  literal  to*  in  to,  yy  represents  the  variable  yy  in  fi,  and  c,  represents 
the  truth  value  of  the  ith  clause  in  F. 

Atom  =  {LEVEL  .LABEL  } 

U  {to,- :  1  <  i  <|to|} 
u  {y>  :  1  <  3  <  rn} 
u  {C,  :  1  <  *•  < 

F  —  Atom  =  {gfc,  q'k  :  1  <  k  <  m} 
p° (LEVEL )  =  {fc  :  1  <  fc  <  m  +  1} 
p0(/)  =  {0,1}  V/  G  Atom- {LEVEL  } 

FCR’s  are  included  to  constrain  both  the  form  and  content  of  the  guesses: 

1.  FCR’s  to  create  strictly  balanced  binary  trees: 

Vfc,  1  <  k  <  m  , 

[LEVEL  fc|  =  [qk  [[yfc  IJjLEVEL  +  l]]|t[?'  \\yk  0][LEVEL  k  +  1|[] 

2.  FCR’s  to  ensure  all  0-level  categories  are  fully  specified: 
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Vi,  1  <  t  <  f  , 

[c,]  =  [tU3<_2]&[u;3i_x)&[t(;3i] 

[LABEL  ]  =  [Ci] 

Vfc,  l  <  k  <  m  , 

[LABEL  ]  =  [yjtj 

3.  FCR’s  to  label  internal  nodes  with  truth  values  determined  by  quantifier 
meaning: 

VA:,  1  <  k  <  m  , 

if  Qk  =  “V”,  then  include: 

[LEVEL  fcj&[LABEL  l]  =  [qk  [[LABEL  l]]]&[<^  [[LABEL  1]]] 

[LEVEL  A:]  &  [LABEL  0]  =  j9fc  [[LABEL  0]]]  V  [q'k  [[LABEL  0]]] 

otherwise  Qk  =  “3”,  and  include: 

[LEVEL  Aj&jLABEL  1]  =  [qk  [[LABEL  1]]]  V  [q'k  [[LABEL  1]]] 

[LEVEL  fc]&[LABEL  0]  =  \qk  [[LABEL  0]]]&[^  [[LABEL  0]]] 

The  category-valued  features  qk  and  q'k  represent  the  quantifier  Qk.  In 
the  category  value  of  qk,  the  formula  variable  yk  —  1  everywhere,  while  in 
the  category  value  of  q'J.,  t/fc  —  0  everywhere. 

4.  one  FCR  to  guarantee  that  only  satisfiable  assignments  are  permitted: 

[LEVEL  1]  D  [LABEL  1] 

5.  FCR’s  to  ensure  that  quantifier  assignments  are  preserved  “down  the  tree” : 

Vi,  A:  1  <  i  <  A:  <  m  , 

[y<  ![  D  I?*  |[y<  i]ll&[9*  [[v«  i]]l 

(y.  0]  3  [?fc  [(y.  0|]]&[g£  [[y,  o]]j 

6.  FCR’s  to  instantiate  variable  assignments  into  the  formula  F: 

Vi,  A:  1  <  i  <|ty|  and  1  <  k  <  m  , 
if  Wi  =  yk,  then  include: 

[y*  !|  [«»»  1] 

[yfc  Oj  D  [tv,  0] 

else  if  Wi  =  y]f,  then  include: 

(y*r  i]  d  [u/j  o) 

[y*:  o[  d  [u»,  i| 
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7.  FCR’s  to  verify  the  guessed  variable  assignments  in  leaf  nodes: 

Vi  1  <  t  <  $  , 

[Ci  0]  =  (tU3i_2  0]&[tt>3i-l  0]4£[tU3i  0) 

[Ci  1]  =  [tu3j_2  l)v[u»3i_i  l]v[tn3<  1] 

[LEVEL  m  +  l|fc[c,-  0]  D  [LABEL  Oj 

[LEVEL  m+l]t[c!  l]&[c2  l]&  - . .  &[cw/3l]  D  [LABEL  l) 

The  reduction  constructs  0(|tu[)  features  and  0(m2)  FCRs  of  sise  0(log  m) 
in  a  simple  manner,  and  consequently  may  be  seen  to  be  polynomial  time. 

q.s.p 

The  primary  source  of  intractability  in  the  theory  of  syntactic  features  is 
the  large  number  of  possible  syntactic  categories  (arising  from  finite  feature 
closure)  in  combination  with  the  computational  power  of  feature  co-occurrence 
restrictions.8  FCRs  of  the  “disjunctive  consequence*  form  [/  v]  D  \fi  ui]  V 
. . .  V  [/„  vn]  compute  the  direct  analogue  of  Satisfiability:  when  used  in  con¬ 
junction  with  other  FCRs,  the  GPSG  effectively  must  try  all  n  feature-value 
combinations. 


3  Complexity  of  GPSG-Recognition 

Two  isolated  membership  problems  for  GPSG’s  component  formal  devices  were 
considered  above  in  an  attempt  to  isolate  sources  of  complexity  in  GPSG  the¬ 
ory.  In  this  section  the  recognition  problem  (RP)  for  GPSG  theory  as  a  whole 
is  considered.  I  begin  by  arguing  that  the  linguistically  and  computationally 
relevant  recognition  problem  is  the  universal  recognition  problem,  as  opposed 
to  the  fixed  language  recognition  problem.  I  then  show  that  the  former  problem 
is  exponential-polynomial  (Exp-Poly)  time-hard. 

'Finite  feature  closure  admits  a  surprisingly  large  number  of  possible  categories.  Given  a 
specification  (P,  it  on.  A,  R,  p)  at  K,  let  a  =|  item  |  and  i  =  |  ¥  -  itsm|.  Assume  that  all  atomic 
features  are  binary:  a  feature  may  be  or  undefined  and  there  are  3*  0- level  categories. 
The  i  category-valued  features  may  each  assume  0(3*)  possible  values  in  a  1-level  category, 
so  |ffl|=  0(3*(3*)*).  More  generally, 

\K  =  Kb  |=  O(3(o^— o  n^TTT’)  =  o(3(a*'Z),_o  =  o(3“'"”)  =  0(3*  M) 

where  X/*=o  if  converges  to  e  2.7  very  rapidly  and  a,  i  =  0(|G|);  a  =  25,  t  =  4  in  GKPS. 
The  smallest  category  in  K  will  be  1  symbol  (null  set),  and  the  largest,  maximally-specified, 
category  will  be  of  symbol-site  log  \  K\—  O(o  ■  6!). 
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3.1  Defining  the  Recognition  Problem 

The  universal  recognition  problem  is:  given  a  grammar  G  and  input  string  x, 
is  x  €  L(G)?.  Alternately,  the  recognition  problem  for  a  class  of  grammars 
may  be  defined  as  the  family  of  questions  in  one  unkown.  This  fixed  language 
recognition  problem  is:  given  an  input  string  z,  is  z  E  L  for  some  fixed  language 
LI.  For  the  fixed  language  RP,  it  does  not  matter  which  grammar  is  chosen  to 
generate  L  —  typically,  the  fastest  grammar  is  picked. 

It  seems  reasonable  clear  that  the  universal  RP  is  of  greater  linguistic  and 
engineering  interest  than  the  fixed  language  RP.  The  grammars  licensed  by 
linguistic  theory  assign  structural  descriptions  to  utterances,  which  are  used  to 
query  and  update  databases,  be  interpreted  semantically,  translated  into  other 
human  languages,  and  so  on.  The  universal  recognition  problem  —  unlike  the 
fixed  language  problem  —  determines  membership  with  respect  to  a  grammar, 
and  therefore  more  accurately  models  the  parsing  problem,  which  must  use  a 
grammar  to  assign  structural  descriptions. 

The  universal  RP  also  bears  most  directly  on  issues  of  natural  language 
acquisition.  The  language  learner  evidently  possesses  a  mechanism  for  selecting 
grammmars  from  the  class  of  learnable  natural  language  grammars  Lg  on  the 
basis  of  linguistic  inputs.  The  more  fundamental  question  for  linguistic  theory, 
then,  is  “what  is  the  recognition  complexity  of  the  class  £g?”.  If  this  problem 
should  prove  computationally  intractable,  then  the  (potential)  tractability  of 
the  problem  for  each  language  generated  by  a  G  in  the  class  is  only  a  partial 
answer  to  the  linguistic  questions  raised. 

Finally,  complexity  considerations  favor  the  universal  RP.  The  goal  of  a 
complexity  analysis  is  to  characterize  the  amount  of  computational  resources 
(e.g.  time,  space)  needed  to  solve  the  problem  in  terms  of  all  computation¬ 
ally  relevent  inputs  on  some  standard  machine  model  (typically,  a  multi-tape 
deterministic  Turing  machine).  We  know  that  both  input  string  length  and 
grammar  size  and  structure  affect  the  complexity  of  the  recognition  problem. 
Hence,  excluding  either  input  from  complexity  consideration  would  not  advance 
our  understanding.9 

Linguistics  and  computer  science  are  primarily  interested  in  the  universal 
recognition  problem  because  both  disciplines  are  concerned  with  the  formal 
power  of  a  family  of  grammars.  Linguistic  competence  and  performance  must 

®This  “consider  all  relevant  inputs”  methodology  is  universally  assumed  in  the  formal  lan¬ 
guage  and  computational  complexity  literature.  For  example,  Hopcraft  and  Ullman(1979:139) 
define  the  context-free  grammar  recognition  problem  as:  “Given  a  CFG  G  =  {V,T,P,S)  and 
a  string  i  in  T' ,  is  x  in  L(G)7".  Garey  and  Johnson(1979)  is  a  standard  reference  work  in 
the  field  of  computational  complexity.  All  10  automata  and  language  recognition  problems 
covered  in  the  book  (pp.  265-271)  are  universal,  i.e.  of  the  form  “Given  an  instance  of  a  ma¬ 
chine/grammar  and  an  input,  does  the  machine/ gram  mar  accept  the  input?”  The  complexity 
of  these  recognition  problems  is  alwayt  calculated  in  terms  of  grammar  and  input  size. 


3  COMPLEXITY  OF  GPSG-RECOGNITION 


12 


be  considered  in  the  larger  context  of  efficient  language  acquisition,  while  com¬ 
putational  considerations  demand  that  the  recognition  problem  be  characterised 
in  terms  of  both  input  string  and  grammar  sise.  Excluding  grammar  sise  from 
complexity  consideration  in  order  to  argue  that  the  recognition  problem  for  a 
family  of  grammars  is  tractable  is  akin  to  fixing  the  sise  of  the  chess  board 
in  order  to  argue  that  winning  the  game  of  chess  is  tractable:  neither  claim 
advances  our  scientific  understanding  of  chess  or  natural  language. 


3.2  GPS G- Recognition  is  Exp-Poly  hard 


The  following  proof  reduces  instances  of  a  polynomial-space  bounded  alternating 
Turing  Machine  computation  to  instances  of  GPSG-Recognition. 

Definition:10 


An  alternating  Turing  Machine  is  like  a  nondeterministic  TM,  except  that 
some  subset  of  its  states  will  be  referred  to  as  universal  states,  and  the  remainder 
as  existential  states.  A  nondeterministic  TM  is  an  alternating  TM  with  no 
universal  states. 

A  k-tape  alternating  Turing  Machine  (ATM)  is  a  10-tuple  M  =<  Q,  E,  T,  $,  #,  fc,  o,  q0,  F,U>, 
where 


Q,qo,F  = 
E,r 

*,# 

k 

6  C 

u 

Q-U  = 


set  of  states,  initial  state,  set  of  accepting  states,  respectively 
input,  tape  alphabets,  E  C  T 
endmarker,  blank  symbol;  $,  #  €  T  —  E 
number  of  read-write  tapes,  k  >  1 

(<?  x  rfc+1)  x  (Q  x  x  {L,  7?}fc+1)  is  the  next  move  relation 
set  of  universal  states,  U  c  Q 
set  of  existential  states 


The  TM  has  a  read-only  input  tape,  with  the  input  w  G  E*  written  as  $w$, 
and  the  reading  head  initialised  to  the  first  symbol  of  w.  The  k  work  tapes  are 
one-way  infinite,  and  are  initially  blank.  A  configuration  of  the  ATM  consists 
of  the  state,  head  positions,  and  contents  of  the  k  +  1  tapes.  One  move  of  the 
TM  consists  of  reading  one  symbol  from  the  input  tape  and  moving  the  heads 
left/right  as  allowed  by  6,  along  with  a  change  of  state  of  the  machine.  No 
move  is  allowed  if  the  head  on  the  input  tape  is  not  within  the  two  end  markers 
(inclusive).  If  C  is  a  configuration  of  M,  let  Nextj^(C)  be  the  set  of  possible 
configurations  after  one  move  of  M .  We  say  a  configuration  is  existential  (resp. 


10Taken  from  Chandra  and  Stockmeyer(1976),  with  the  restriction  that  the  work  tapes  are 
one-way  infinite,  instead  of  two-way  infinite. 


V. 


-O' 
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universal,  accepting)  if  the  state  of  the  TM  in  that  configuration  is  an  existential 
(resp.  universal,  accepting)  state. 

The  computation  of  an  alternating  TM  M  on  an  input  w  is  a  possibly  infinite 
tree  where  the  nodes  correspond  to  TM  configurations.  Technically,  each  node 
is  of  the  form  (x,  C)  where  x  £  {0,1,...}*,  and  C  is  the  configuration.  The 
root  is  («,  Co)  where  Co  is  the  initial  configuration,  and  the  sons  of  (x,  C)  are 

all  elements  (xt,C<)  where  Nextjw(C)  =  {Co,C\ . C*}.  The  criterion  for 

acceptance  is  as  follows.  Let  N  be  the  set  of  nodes  in  the  computation  tree 
of  TM  M  on  input  w.  We  label  the  nodes  of  the  tree  either  true  or  false 
according  to  the  following  criteria.  A  labeling  L  :  N  —*  {  true,  false  }  is  said 
to  be  acceptable  if 

1.  if  C  is  an  accepting  configuration 

L(x,  C)  =  true 

2.  if  C  is  an  existential  configuration 

L(x,  C)  =  \/  L(xi,C') 

c'eNextjutc?) 

3.  if  C  is  a  universal  configuration 

L(x,  C)  =  /\  L(xi,  C') 

c'eNextw(C) 

By  convention,  V  (resp.  of  an  empty  set  is  false  (resp.  true).  M  accepts 
the  input  w  if  and  only  if  L(t,  Co)  =  true  for  all  acceptable  labelings  L. 

Note  that  alternating  TMs  without  universal  states  accept  exactly  as  do  the 
corresponding  nondeterministic  TMs. 

Theorem  3  GPSG-Recognition  is  Exp-Poly  time-hard 

Proof:  By  direct  simulation  of  a  polynomial  space  bounded  alternating  Turing 
Machine  M  on  input  tu.11 

“Without  loss  of  generality,  we  use  a  1-tape  ATM,  so 

s  c  (q  x  r  x  r)  x  (<?  x  r  x  {l,  r)  x  {l,  r }) 
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Let  S(n)  be  a  polynomial  in  n.  Then,  on  inpnt  M,  a  S(n)  space-bounded  one 
tape  alternating  Turing  Machine  (ATM),  and  string  w,  we  construct  a  GPSG 
G  in  polynomial  time  such  that  w  e  L(M)  iff  I0u>iltt>a2...tu,,n$n+1  €  L(G). 

By  Chandra  and  Stockmeyer(1976), 

ASPAC£(S(n))  =  [j  DTIME(cs (n>) 

c>  0 

where  ASPACE(S(n))  is  the  class  of  problems  solvable  in  space  S(n)  on  an 
ATM,  and  DTIME(F(n))  is  the  class  of  problems  solvable  in  time  P(n)  on  a 
deterministic  Turing  Machine.  As  a  consequence  of  this  result  and  our  following 
proof,  we  have  the  immediate  result  that  GPSG-Recognition  is  DTIME[cs *n*)- 
hard,  for  all  constants  c,  or  Exp-Poly  time-hard. 

The  nodes  of  the  ATM  computation  tree  are  represented  by  syntactic  cate¬ 
gories  in  K°  —  one  feature  for  every  tape  square,  plus  three  features  to  encode 
the  ATM  tape  head  positions  and  the  current  state.  The  reduction  is  limited 
to  specifying  a  polynomial  number  of  features  in  polynomial  time;  since  these 
features  are  used  to  encode  the  ATM  tape,  the  reduction  may  only  specify 
polynomial  space  bounded  ATM  computations. 

The  ID  rules  encode  the  ATM  NextjuO  relation,  i.e.  C  -*  NextM(C)  for 
a  universal  configuration  C.  The  reduction  constructs  an  ID  rule  for  every 
combination  of  possible  head  position,  machine  state,  and  symbol  on  the  scanned 
tape  square.  Principles  of  universal  feature  instantiation  transfer  the  rest  of  the 
instantaneous  description  (i.e.  contents  of  the  tape)  from  mother  to  daughters 
in  ID  rules. 

Let  Nextv(G)  =  {Co,  Ci, . . . ,  Ck}.  If  C  is  a  universal  configuration,  then 
we  construct  an  ID  rule  of  the  form 

C  -+Co,Cu...,Ck  (6) 

Otherwise,  C  is  an  existential  configuration  and  we  construct  the  k  4-  1  ID 
rules 


C-C,  Vi,  0  <i<k  (7) 

A  universal  ATM  configuration  is  labeled  accepting  if  and  only  if  it  has  halted 
and  accepted,  or  if  all  of  its  daughters  are  labeled  accepting.  We  reproduce  this 
with  the  ID  rules  in  6  (or  8),  which  will  be  admissible  only  if  all  subtrees  rooted 
by  the  RHS  nodes  are  also  admissible. 

An  existential  ATM  configuration  is  labeled  accepting  if  and  only  if  it  has 
halted  and  accepted,  or  if  one  of  its  daughters  is  labeled  accepting.  We  reproduce 
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this  with  the  ID  rules  in  7  (or  9),  which  will  be  admissible  only  if  one  subtree 
rooted  by  a  RHS  node  is  admissible. 

All  features  that  represent  tape  squares  are  declared  to  be  in  the  HEAD 
feature  set,  and  all  daughter  categories  in  the  constructed  ID  rules  are  head 
daughters,  thus  ensuring  that  the  Head  Feature  Convention  (HFC)  will  transfer 
the  tape  contents  of  the  mother  to  the  daughter(s),  modulo  the  tape  writing 
activity  specified  by  the  next  move  relation. 

Details. 

Let 

Resulted  (»,  a,  d)  = 

(  [[HEADO  t  +  1],  [*  a],  \A  1]]  if  d  =  R 

\  [[HEADO  i  -  1],  [t  a],  [A  1]]  if  d  =  L 

Result l^(y,  c,  p,  d)  = 

(  [[HEAD1  y+l],[r,  c][STATE  p][  if  d  =  R 

\  [[HEAD1  j  -  lj.fr,  cj [STATE  p][  if  d  =  L 

Transw (<?,<*,  6)  =  {(p,c,di,d7)  :  ((?,o,6>,  (p,c,di,d7))  €  5} 

where 

a  is  the  read-only  (R/O)  tape  symbol  currently  be¬ 
ing  scanned 

h  is  the  read-write  (R/W)  tape  symbol  currently  be¬ 
ing  scanned 

di  is  the  R/O  tape  direction 
d7  is  the  R/W  tape  direction 

The  GPSG  G  contains: 

1.  Feature  definitions 

A  category  in  K°  represents  a  node  of  an  ATM  computation  tree,  where 
the  features  in  Atom  encode  the  ATM  configuration.  Labeling  is  per¬ 
formed  by  ID  rules. 
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(a)  definition  of  F,  Atom ,  A 

F  =  Atom  =  {STATE  ,  HEADO  ,  HEAD1  ,  A} 

U  {t  :  0  <  *  <|to|  +1} 

U  {r,  :  1  <  j<S(\w\)} 

A  =  <3  U  E  U  T  ;  as  defined  earlier 

(b)  definition  of  p° 

P°(A)  =  {1,2,3} 

p°(STATE  )  =  Q  ;  the  ATM  state  set 

p°(HEADO  )  =  O' :  1  <  j  <|w|} 

p°(HEADl  )  =  {*  :  1  <  *  <  5(|to|)> 

v/e{*:0<t<H+i} 

p°(/)  =  EU  {$}  ;  the  ATM  input  alphabet 

V/  e  {T,  :l<j<  5(M)} 

p°(/)  =  r  ;  the  ATM  tape  alphabet 

(c)  definition  of  HEAD  feature  set 

HEAD  =  {*  :  0  <  i  <|u>|  +1}  U  {r,  :  1  <  j  <  S(M)} 

(d)  FCRs  to  ensure  full  specification  of  all  categories  except  null  ones. 

V/,  /  €  Atom,  [STATE  ]  3  [/) 

2.  Grammatical  rules 


V»,  j,q,a,b  :  1  <  i  <|t»|,  1  <  j  <  S’du'l),  q  €  Q,  a€E,  6  €  T 
if  Trans  a/ (9,  a,  6)  /  0,  construct  the  following  ID  rules. 

(a)  if  q  e  U  (universal  state) 

{[HEADO  *),[*•  aj,  [HEAD1  j  J,  [Ty  tj,  [STATE  q],  [A  1)}-+ 
{ResultOw(t',a,difc)  UResultlji/O'.Ck.pfc.djfc)  :  (8) 

(Pk,ck,dik,d2k)  S  Transw(?,M)} 

where  all  categories  on  the  RHS  are  heads. 

(b)  otherwise  q  €  Q  —  V  (existential  state) 

V(pfc,  Cfc,  dik,  d2k)  e  TransM(g,o,6), 

{(HEADO  *],  [s'  oj,  [HEAD1  7],  [r,  b],  [STATE  ?[, \A  1]}  -  (9) 

ResultOM(»,a,dJfc)  U  ResultlM(y,Cfc,pfc,d2fc) 

where  all  categories  on  the  RHS  are  heads. 
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(c)  One  ID  rule  to  terminate  accepting  states,  using  null-transitions. 

{(STATE  h],jl  Y]}  -*(  (10) 

(d)  Two  ID  rules  to  read  input  strings  and  begin  the  ATM  simulation. 
The  A  feature  is  used  to  separate  functionally  distinct  components 
of  the  grammar.  [A  1]  categories  participate  in  the  direct  ATM 
simulation,  [A  2]  categories  are  involved  in  reading  the  input  string, 
and  the  [A  3]  category  connects  the  read  input  string  with  the  ATM 
simulation  start  state. 


START  — *  {[A  l]},  {[A  2]} 

{[A  2|}-{[A  2]},  {(A  2]} 
where  all  daughters  are  head  daughters,  and  where 

START  =  {(HEADO  lj,  [HEAD1  1),  [STATE  s|,  [A  3)} 

u  {|ry  #]:  1  <;<  5(H)} 

(e)  the  lexical  rules, 


Vo, »  a  6  E,  1  <  i  <|to|> 

<  at,  (|A  2),  (*'  a]}  > 

Vt  0<  »<H+i. 

<  $»,{[A  2],  (*  $]}  > 


(12) 


The  reduction  plainly  may  be  performed  in  polynomial  time  in  the  size  of 
the  simulated  ATM,  by  inspection. 

No  metarules  or  LP  statements  are  needed,  although  metarules  could  have 
been  used  instead  of  the  Head  Feature  Convention.  Both  devices  are  capable  of 
transferring  the  contents  of  the  ATM  tape  from  the  mother  to  the  daughter(s). 
One  metarule  would  be  needed  for  each  tape  square/tape  symbol  combination 
in  the  ATM. 

CKPS  Definition  5. 14  of  Admissibility  guarantees  that  admissible  trees  must 
be  terminated.12  By  the  construction  above  —  see  especially  the  ID  rule  10  — 

1  *  The  admissibility  of  nonlocal  trees  is  defined  as  follows  (GKPS,  p.104): 


Definition:  Admissibility 

Let  R  be  a  set  of  ID  rules.  Then  a  tree  t  is  admiuible  from  R  if  and  only  if 
1  <  is  terminated,  and 

2.  every  local  subtree  in  f  is  either  terminated  or  locally  admissible  from  some 
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an  [A  1]  node  can  be  terminated  only  if  it  is  an  accepting  configuration  (i.e. 
it  has  halted  and  printed  Y  on  its  first  square).  This  means  the  only  admissible 
trees  are  accepting  ones  whose  yield  is  the  input  string  followed  by  a  very  long 
empty  string. Q.C .V 

3.3  Sources  of  Intractability 

The  two  sources  of  intractability  in  GPSG  theory  spotlighted  by  this  reduction 
are  null-transitions  in  ID  rules  (see  the  ID  rule  10  above),  and  universal  feature 
instantiation  (in  this  case,  the  Head  Feature  Convention). 

Grammars  with  unrestricted  null-transitions  can  assign  elaborate  phrase 
structure  to  the  empty  string,  which  is  linguistically  undesirable  and  compu¬ 
tationally  costly.  The  reduction  must  construct  a  GPSG  G  and  input  string  x 
in  polynomial  time  such  that  x  €  L(G)  iff  w  6  L(M),  where  M  is  a  PSPACE- 
bounded  ATM  with  input  to.  The  ‘polynomial  time’  constraint  prevents  us 
from  making  either  x  or  G  too  big.  Null-transitions  allow  the  grammar  to 
simulate  the  PSPACE  ATM  computation  (and  an  Exp-Poly  TM  computation 
indirectly)  with  an  enormously  long  derivation  string  and  then  erase  the  string. 
If  the  GPSG  G  were  unable  to  erase  the  derivation  string,  G  would  only  accept 
strings  which  were  exponentially  larger  than  M  and  w,  i.e.  too  big  to  write 
down  in  polynomial  time. 

The  Head  Feature  Condition  transfers  HEAD  feature  values  from  the  mother 
to  the  head  daughters  just  in  case  they  don’t  conflict.  In  the  reduction  we  use 
HEAD  features  to  encode  the  ATM  tape,  and  thereby  use  the  HFC  to  transfer 
the  tape  contents  from  one  ATM  configuration  C  (represented  by  the  mother)  to 
its  immediate  successors  Co, (the  head  daughters).  The  configurations 
C,  C0,  have  identical  tapes,  with  the  critical  exception  of  one  tape  square. 

If  the  HFC  enforced  absolute  agreement  between  the  HEAD  features  of  the 
mother  and  head  daughters,  we  would  be  unable  to  simulate  the  PSPACE  ATM 
computation  in  this  manner. 


4  Interpreting  the  Result 

4.1  Generative  Power  and  Computational  Complexity 

At  first  glance,  a  proof  that  GPSG-Recognition  is  Exp-Poly  hard  appears  to 
contradict  the  fact  that  context-free  languages  can  be  recognised  in  0(n3)  time 
by  a  wide  range  of  algorithms.  To  see  why  there  is  no  contradiction,  we  must  first 
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explicitly  state  the  argument  from  weak  context-free  generative  power,  which 
we  dub  the  efficient  payability  (EP)  argument. 

The  EP  argument  states  that  any  GPSG  can  be  converted  into  a  weakly 
equivalent  context-free  grammar  (CFG),  and  that  CFG-Recognition  is  polyno¬ 
mial  time;  therefore,  GPSG-Recognition  must  also  be  polynomial  time.  The 
EP  argument  continues:  if  the  conversion  is  fast,  then  GPSG-Recognition  is 
fast,  but  even  if  the  conversion  is  slow,  recognition  using  the  “compiled”  CFG 
will  still  be  fast,  and  we  may  justifiably  lose  interest  in  recognition  using  the 
original,  slow,  GPSG. 

The  EP  argument  is  misleading  because  it  ignores  both  the  effect  conversion 
has  on  grammar  size,  and  the  effect  grammar  size  has  on  recognition  speed.  Cru¬ 
cially,  grammar  size  affects  recognition  time  in  all  known  algorithms,  and  the 
only  grammars  directly  usable  by  context-free  parsers,  i.e.  with  the  same  com¬ 
plexity  as  a  CFG,  are  those  composed  of  context-free  productions  with  atomic 
nonterminal  symbols.  For  GPSG,  this  is  the  set  of  admissible  local  trees,  and 
this  set  is  astronomical: 


0((3m!m’m+‘)  (13) 

in  a  GPSG  G  of  size  m.13 

Context-free  payers  like  the  Earley  algorithm  run  in  time  0(|  G'  |2  n3) 
where  |G' |  is  the  size  of  the  CFG  G'  and  n  the  input  string  length,  so  a  GPSG 
G  of  size  m  will  be  recognized  in  time 


0(32m!m””+‘  n3)  (14) 

The  hyper-exponential  term  will  dominate  the  Earley  algorithm  complexity 
in  the  reduction  above  because  m  is  a  function  of  the  size  of  the  ATM  we  are 

l3A»  we  saw  above,  the  metarule  finite  closure  operation  can  increase  the  ID  rule  grammar 
size  from  |  ft|=  0(|G|)  to  0{mJm)  in  a  GPSG  G  of  size  m.  We  ignore  the  effects  of  ID/LP 
format  on  the  number  of  admissible  local  trees  here,  and  note  that  if  we  expanded  out  all 
admissible  linear  precedence  possibilities  in  FC(M,R),  the  resultant  'ordered’  ID  rule  grammar 
would  be  of  size  0(m2”*!).  In  the  worst  case,  every  symbol  in  FC(M,R)  is  underspecified,  and 
every  category  in  K  extends  every  symbol  in  the  FC(M,R)  grammar.  Since  there  are 

0(3m  ml) 

possible  syntactic  categories,  and  0(m3m)  symbols  in  FC(M,R),  the  number  of  admissible 
local  trees  (=  atomic  context-free  productions)  in  G  is 


0((3m  ml  )*,’">)  _  0(3  mlm”"+‘) 

i.e.  astronomical.  Ristad(1986)  argues  that  the  minimal  set  of  admissible  local  trees  in  GKPS' 
GPSG  for  English  is  considerably  smaller,  yet  still  contains  more  than  10s’  local  trees. 
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simulating.  Even  if  the  GPSG  is  held  constant,  the  stunning  derived  grammar 
sise  in  formula  13  turns  up  as  an  equally  stunning  ‘constant’  multiplicative 
factor  in  14,  which  in  turn  will  dominate  the  real-world  performance  of  the 
Earley  algorithm  for  all  expected  inputs  (Le.  any  that  can  be  written  down  in 
the  universe),  every  time  we  use  the  derived  grammar.14 

Pullum(1985)  has  suggested  that  “examination  of  a  suitable  ‘typical’  GPSG 
description  reveals  a  ratio  of  only  4  to  1  between  expanded  and  unexpanded 
grammar  statements,”  strongly  implying  that  GPSG  is  efficiently  processable  as 
a  consequence.15  But  this  “expanded  grammar*  is  not  adequately  expanded,  i.e. 
it  is  not  composed  of  context-free  productions  with  unanalysable  nonterminal 
symbols.16  These  informal  tractability  arguments  are  a  particular  instance  of 
the  more  general  EP  argument  and  are  equally  misleading. 

The  preceding  discussion  of  how  intractability  arises  when  converting  a 
GPSG  into  a  weakly  equivalent  CFG  does  not  in  principle  preclude  the  ex¬ 
istence  of  an  efficient  compilation  step.  If  the  compiled  grammar  is  truly  fast 
and  assigns  the  same  structural  descriptions  as  the  uncompiled  GPSG,  and  it 
is  possible  to  compile  the  GPSG  in  practice,  then  the  complexity  of  the  univer¬ 
sal  recognition  problem  would  not  accurately  reflect  the  real  cost  of  parsing.17 

14 The  compiled  grammar  recognition  problem  is  at  least  as  intractable  as  the  uncompiled 
one.  Even  worse,  Barton(l985)  shows  how  the  grammar  expansion  increases  both  the  space 
and  time  costs  of  recognition,  when  compared  to  the  cost  of  using  the  grammar  directly. 

15 This  substantive  argument  does  not  appear  to  fit  in  with  the  GKPS  goal  of  a  purely  formal 
investigation  of  linguistics:  “The  universalism  [of  natural  language]  is,  ultimately,  intended  to 
be  entirely  embodied  in  the  formal  system,  not  expressed  by  statements  made  in  it."GKPS(4). 
It  is  difficult  to  respond  precisely  to  the  claims  made  in  Pullum(1985),  since  the  abstract  is 
(necessarily)  brief  and  consists  of  assertions  unsupported  by  factual  documentation  or  clari¬ 
fying  assumptions. 

14  “Expanded  grammar”  appears  to  refer  to  the  output  of  metarule  finite  closure  (i.e.  ID 
rules),  and  this  expanded  grammar  is  tractable  only  if  the  grammar  is  directly  usable  by  the 
Earley  algorithm  exactly  as  context-free  productions  are:  all  nonterminals  in  the  context-free 
productions  must  be  unanalysable.  But  the  categories  and  ID  rules  of  the  metarule  finite 
closure  grammar  do  not  have  this  property.  Nonterminals  in  GPSG  are  decomposable  into 
a  complex  set  of  feature  specifications  and  cannot  be  made  atomic,  in  part  because  not  all 
extensions  of  ID  rule  categories  are  legal.  For  example,  the  categories  VPOIHV,  VFORM  PAS] 
and  VP[*IHV,  VFORM  FIB]  are  not  legal  extensions  of  VP  in  English,  while  VP[«IHV,  ♦AUX, 
VFORM  FIR]  is.  FCRs,  FSDs,  LP  statements,  and  principles  of  universal  feature  instantiation 
—  all  of  which  contribute  to  GPSG’s  intractability  —  must  all  still  apply  to  the  rules  of  this 
expanded  grammar. 

Even  if  we  ignore  the  significant  computational  complexity  introduced  by  the  machinery 
mentioned  in  the  previous  paragraph  (i.e.  theory  of  syntactic  features,  FCRs,  FSDs,  ID/LP 
format,  null-transitions,  and  metarules),  GPSG  will  Ml  not  obtain  an  efficient  portability  result 
This  is  because  the  Head  Feature  Convention  alone  ensures  that  the  universal  recognition 
problem  for  GPSGs  will  be  NP-hard  and  likely  to  be  intractable.  Ristad(1986)  contains 
a  proof.  This  result  should  not  be  surprising,  given  that  (1)  principles  of  universal  feature 
instantiation  in  current  GPSG  theory  replace  the  metarules  of  earlier  versions  of  GPSG  theory, 
and  (2)  metarules  are  known  to  cause  intractability  in  GPSG. 

1TThe  existence  or  nonexistence  of  efficient  compilation  functions  does  not  affect  either  our 
scientific  interest  in  the  universal  grammar  recognition  problem  or  the  power  and  relevance  of 
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But  until  such  a  suggestion  is  forthcoming,  we  must  assume  that  it  does  not 
exist.18, 19 


4.2  Complexity  and  Succinctness 

The  major  complexity  result  of  this  paper  proves  that  the  fastest  algorithm  for 
GPSG-Recognition  must  take  more  than  exponential  time.  The  immediately 
preceding  section  demonstrates  exactly  how  a  particular  algorithm  for  GPSG- 
Recognition  (the  EP  argument)  comes  to  grief:  weak  context-free  generative 
power  does  not  ensure  efficient  payability  because  a  GPSG  G  is  weakly  equiva¬ 
lent  to  a  very  large  CFG  G',  and  CFG  size  affects  recognition  time.  The  rebut¬ 
tal  does  not  suggest  that  computational  complexity  arises  from  representational 
succinctness,  either  here  or  in  general. 

Complexity  results  characterize  the  amount  of  resources  needed  to  solve 
instances  of  a  problem,  while  succinctness  results  measure  the  space  reduction 
gained  by  one  representation  over  another,  equivalent,  representation. 

There  is  no  casual  connection  between  computational  complexity  and  repre¬ 
sentational  succinctness,  either  in  practice  or  principle.  In  practice,  converting 
one  grammar  into  a  more  succinct  one  can  either  increase  or  decrease  the  recog¬ 
nition  cost.  For  example,  converting  an  instance  of  context-free  recognition 
(known  to  be  polynomial  time)  into  an  instance  of  context-sensitive  recognition 

a  complexity  analysis.  If  complexity  theory  classifies  a  problem  as  intractable,  we  learn  that 
something  more  must  be  said  to  obtain  tractability,  and  that  any  efficient  compilation  step, 
if  it  exists  at  all,  must  itself  be  costly. 

*®Note  that  the  GPSG  we  constructed  in  the  preceding  reduction  will  actually  accept 
any  input  z  of  length  less  than  or  equal  to  |  tu )  if  and  only  if  the  ATM  M  accepts  it  us¬ 
ing  5(|  iu  |)  space.  We  prepare  an  input  string  i  for  the  GPSG  by  converting  it  to  the 
string  $0zilij2  .  . .  x„n$n  +  1  e  g,  abadct  is  accepted  by  the  ATM  if  and  only  if  the  string 
$0alb2a3d4e5e6$7  is  accepted  by  the  GPSG.  Trivial  changes  in  the  grammar  allows  us  to 
permute  and  “spread"  the  characters  of  *  across  an  infinite  class  of  strings  in  an  unbounded 
number  of  ways,  e.g.  tO~ii  z.iqj  ...  7»iil'T6  Tntn+  1  where  each  q,  is  a  string  over  an 
alphabet  which  is  distinct  from  the  <7t  alphabet  Although  the  flexibility  of  this  construction 
results  in  a  more  complicated  GPSG,  it  argues  powerfully  against  the  existence  of  any  efficient 
compilation  procedure  for  GPSGs.  Any  efficient  compilation  procedure  must  perform  more 
than  an  exponential  polynomial  amount  of  work  fGPSG-Recognition  takes  at  least  Exp-Poly 
time)  on  at  least  an  exponential  number  of  inputs  (all  inputs  that  fit  in  the  |  w  |  space  of  the 
ATM’s  read-only  tape).  More  importantly,  the  required  compilation  procedure  will  convert 
any  exponential-polynomial  time  bounded  Turing  Machine  into  a  polynomial-time  TM  for  the 
class  inputs  whose  membership  can  be  determined  within  a  arbitrary  (fixed)  exp-poly  time 
bound.  Simply  listing  the  accepted  inputs  will  not  work  because  both  the  GPSG  and  TM  may 
accept  an  infinite  class  of  inputs  Such  a  compilation  procedure  would  he  extremely  powerful 

*®Note  that  compilation  illegitimately  assumes  that  the  compilation  step  is  free  There  is 
one  theory  of  primitive  language  learning  and  use:  conjecture  a  grammar  and  use  it  For  this 
procedure  to  work,  grammars  should  be  easy  to  test  on  small  inputs  The  overall  complexity 
of  learning,  testing,  and  speech  must  be  considered  Compilation  speeds  up  the  speech  com¬ 
ponent  at  the  expense  of  greater  complexity  in  the  other  two  components  For  this  linguistic 
reason  the  compilation  argument  is  suspect 
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(known  to  be  PSPACE-complete  and  likely  to  be  intractable)  can  significantly 
speed  the  recognition  problem  if  the  conversion  decreases  the  size  of  the  CFG 
logarithmically  or  better.  Even  more  strangely,  increasing  ambiguity  in  a  CFG 
can  speed  recognition  time  if  the  succinctness  gain  is  large  enough,  or  slow  it 
down  otherwise  —  unambiguous  CFGs  can  be  recognised  in  linear  time,  while 
ambiguous  ones  require  cubic  time. 

In  principle,  tractable  problems  may  involve  succinct  representations.  For 
example,  the  iterating  coordination  schema  (ICS)  of  GPSG  is  an  unbeatably 
succinct  encoding  of  an  infinite  set  of  context-free  rules;  from  a  computational 
complexity  viewpoint,  the  ICS  is  utterly  trivial  using  a  slightly  modified  Earley 
algorithm.20  Tractable  problems  may  also  be  verbosely  represented:  consider  a 
random  finite  language,  which  may  be  recognized  in  essentially  constant  time  on 
a  typical  computer  (using  a  hash  table),  yet  whose  elements  must  be  individually 
listed.  Similarly,  intractable  problems  may  be  represented  both  succinctly  and 
nonsuccinctly.  As  is  well  known,  the  Turing  machine  for  any  arbitrary  r.e.  set 
may  be  either  extremely  small  or  monstrously  big.  Winning  the  game  of  chess 
when  played  on  an  n  x  n  board  is  likely  to  be  computationally  intractable,  yet 
the  chess  board  is  not  intended  to  be  an  encoding  of  another  representation, 
succinct  or  otherwise. 

Tractable  problems  may  involve  succinct  or  nonsuccinct  representations,  as 
may  intractable  problems.  The  reductions  in  this  paper  show  that  GPSGs 
are  not  merely  succinct  encodings  of  some  context-free  grammars;  they  are 
inherently  complex  grammars  for  some  context-firee  languages.  The  heart  of 
the  matter  is  that  GPSG’s  formal  devices  are  computationally  complex  and  can 
encode  provably  intractable  problems. 

4.3  Relevance  of  the  Result 

In  this  paper,  we  argued  that  there  is  nothing  in  the  GPSG  formal  framework 
that  guarantees  computational  tractability:  proponents  of  GPSG  must  look  else¬ 
where  for  an  explanation  of  efficient  payability,  if  one  is  to  be  given  at  all. 
The  crux  of  the  matter  is  that  the  complex  components  of  GPSG  theory  in¬ 
teract  in  intractable  ways,  and  that  weak  context-free  generative  power  does 
not  guarantee  tractability  when  grammar  size  is  taken  into  account.  A  faithful 
implementation  of  the  GPSG  formalisms  of  GKPS  will  provably  be  intractable; 
expectations  computational  linguistics  might  have  held  in  this  regard  are  not 
fulfilled  by  current  GPSG  theory. 

This  formal  property  of  GPSGs  is  straightforwardly  interesting  to  GPSG 

J0A  more  extreme  example  of  the  unrelatednes*  of  euccinctnete  and  complexity  is  the  ab¬ 
solute  succinctness  with  which  the  dense  language  E*  mt;  be  represented  —  whether  by  a 
regular  expression,  CFG,  or  even  Turing  machine  —  yet  members  of  E*  may  be  recognised 
in  constant  time  (i.e.  always  accept) 
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linguists.  As  outlined  by  GKPS,  “an  important  goal  of  the  GPSG  approach  to 
linguistics  [is]  the  construction  of  theories  of  the  structure  of  sentences  under 
which  significant  properties  of  grammars  and  languages  fall  out  as  theorems  as 
opposed  to  being  stipulated  as  axioms  (p.4).” 

The  role  of  a  computational  analysis  of  the  sort  provided  here  is  fundamen¬ 
tally  positive:  it  can  offer  significant  formal  insights  into  linguistic  theory  and 
human  language,  and  suggest  improvements  in  linguistic  theory  and  real-world 
parsers.  The  insights  gained  may  be  used  to  revise  the  linguistic  theory  so  that 
it  is  both  stronger  linguistically  and  weaker  formally.  Work  on  revising  GPSG  is 
in  progress.  Briefly,  some  proposed  changes  suggested  by  the  preceding  reduc¬ 
tions  are:  unit  feature  closure,  no  FCRs  or  FSDs,  no  null-transitions  in  ID  rules, 
metarule  unit  closure,  and  no  problematic  feature  specifications  in  the  princi¬ 
ples  of  universal  feature  instantiation.  Not  only  do  these  restrictions  alleviate 
most  of  GPSG’s  computational  intractability,  but  they  increase  the  theory’s 
linguistic  constraint  and  reduce  the  number  of  nonnatural  language  grammars 
licensed  by  the  theory.  Unfortunately,  there  is  insufficient  space  to  discuss  these 
proposed  revisions  here  —  the  reader  is  referred  to  Ristad(1986)  for  a  complete 
discussion. 
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